Abstract

The literature on clustering for continuous data is rich and wide; by
contrast, the literature on clustering for categorical data is still limited.
In some cases, the problem is made more difficult by the presence of noise
variables/dimensions that do not contain information about the clustering
structure and could mask it. The aim of this paper is to propose
a model for simultaneous clustering and dimensionality reduction of
ordered categorical data that is able to detect the discriminative dimensions
and discard the noise ones. Following the underlying response variable
approach, the observed variables are considered as a discretization
of underlying first-order latent continuous variables distributed as a
Gaussian mixture. To recognize discriminative and noise dimensions,
these variables are considered to be linear combinations of two independent
sets of second-order latent variables, where only one contains
the information about the cluster structure while the other contains
noise dimensions. The model specification involves multidimensional
integrals that make maximum likelihood estimation cumbersome
and in some cases infeasible. To overcome this issue, the parameter
estimation is carried out through an EM-like algorithm maximizing
a pairwise log-likelihood. Applications of the model to
real and simulated data show the effectiveness of the
proposal.


*Department of Statistics, The Pennsylvania State University, USA 
monia.ranalli@psu.edu

UGF Department, University of Tor Vergata, Rome, roberto.rocci@uniroma2.it 





Keywords: Mixture models, Data reduction, Ordinal data, Pairwise
likelihood, EM algorithm





1 Introduction 


Cluster analysis aims at partitioning the data into meaningful groups which
should differ considerably from each other. The literature on clustering for
continuous data is rich and wide; by contrast, the literature on clustering for
categorical data is still limited. In fact, only in the last decades has there
been an increasing interest in clustering categorical data, although they are
encountered in many fields, such as the behavioural, social and health sciences.
These variables are frequently of ordinal type, measuring attitudes, abilities
or opinions, and practitioners often apply to their ranks models and techniques
developed for continuous data. Several authors have shown that this
procedure can give biased estimates and is definitely less efficient than a
proper model that takes into account the ordinal nature of
the data (e.g. [38]). Such models mainly adopt two approaches developed in
the factor analysis framework: IRT (Item Response Theory) and URV (Underlying
Response Variables). In the former, the probabilities of the categories are
assumed to be analytic functions of some latent variables having a particular
cluster structure. The best known model is latent class analysis (LCA; [16]),
where the latent variable is nominal. Examples where the latent variables
are continuous are found in [7], [33], [IS]. In the URV approach, the ordinal
variables are seen as a discretization of continuous latent variables jointly
distributed as a finite mixture; examples are: mi. ISl, [38]. In both
approaches, the use of latent continuous variables makes the estimation rather
complex because it requires the computation of many high-dimensional
integrals. The problem is usually solved by approximating the log-likelihood
function. Several lines of research propose different approximations,
but they share the same idea: replacing the full likelihood with a surrogate
that is easier to maximize and to use for inference on the model parameters. In
this regard we mention some useful surrogate functions, such as the
variational likelihood [151 SI] or the pairwise likelihood [38], used to cluster
categorical or ordinal data, respectively. Besides these, other approaches
based on simulating the hidden variables exist.

In some cases, the clustering problem is made more difficult by the presence of
variables and/or dimensions (named noise) that are uninformative for
recovering the latent groups and could obscure the true cluster structure.
Different approaches exist in the literature to identify discriminative
dimensions that emphasize group separability and give a representation of the
cluster structure while discarding the irrelevant and redundant noise
dimensions. We can distinguish between variable selection and dimensionality
reduction approaches.
In the first we find proposals which aim at estimating the cluster pattern by
selecting the set of variables which best describes the cluster structure. In
the context of continuous data, [37] formulates the problem of variable
selection as a model comparison problem using the BIC, in which the variables
are partitioned into two exclusive subsets representing the relevant, or
discriminative, and the irrelevant, or noise, variables, respectively. inoi extend
this approach, while 03 propose to perform the variable selection by using a
lasso penalty. Many other authors have extended the aforementioned works
or proposed different approaches, but almost exclusively on continuous data.
In the context of categorical data there are only a few proposals. We mention
[To] and [16], who extend the work of 03 to the latent class model.

On the other hand, the dimensionality reduction approach aims at discarding
the irrelevant dimensions by identifying a reduced number of latent variables
containing the information about the cluster structure. The easiest way to
implement this approach is the so-called tandem analysis [T]. It is a two-step
procedure, where in the second step a clustering model/method is applied
to a reduced number of dimensions identified in the first step. Depending
on the measurement scale of the data, the first step can be implemented by
using either principal components analysis (PCA), factor analysis, PCA for
qualitative data [IH] or multiple correspondence analysis ([IE]). Of course, it
is difficult to find the discriminative dimensions without knowing the cluster
structure. In fact, the main problem with tandem analysis is that there is no
guarantee that the reduced data obtained in step one are optimal for
recovering the cluster structure in step two m [I]). This may hide or even
distort the cluster structure underlying the data. As a solution to the problem,
data reduction and cluster analysis should be performed simultaneously. In
this way the latent factors are identified so as to highlight the cluster
structure rather than, as happens in some cases, to obscure it. Several
techniques for simultaneous clustering and dimensionality reduction (SCR) have
been proposed in a non-model-based framework for quantitative (e.g.: [12]; [10])
or categorical data (e.g.: 113 !; |Z3).

There are also approaches based on a family of mixture models which fit
the data into a common discriminative subspace (see e.g. [231 E]). The key
idea is to assume a latent subspace common to all groups that is the most
discriminative one. This allows the data to be projected into a
lower-dimensional space that preserves the clustering characteristics, in order
to improve visualization and interpretation of the underlying structure of the
data. The model can be formulated as a finite mixture of Gaussians with a
particular set of constraints on the parameters.

It is worth pointing out that SCR partially overlaps with the parsimony
criterion. Indeed, in a high-dimensional context, the curse of dimensionality
leads to defining models that capture the essential clustering features while
reducing the number of parameters. One of the earliest parsimonious proposals
is the mixture of factor analyzers (MFA). The MFA model differs from the factor
analysis model in having different local factor models, whereas standard
factor analysis assumes a common factor model. The use of the MFA to cluster
the data and simultaneously reduce the dimensionality of each cluster locally
was originally proposed by [13] and [T3|. Later, a general framework for the
MFA model was proposed by [32]. Furthermore, we refer the reader
to 1121 and [3|, who considered the related model of mixtures of principal
component analyzers for the same purpose. Further references may be
found in chapter 8 of [31] and in a recent review on model-based clustering of
high-dimensional data [3]. As regards categorical data, we find few analogous
proposals (see e.g. [IHl |33l ES, 1^).

The aim of this paper is to propose a model for SCR on ordered categorical
data. Following the URV approach, the observed variables are considered
as a discretization of underlying first-order latent continuous variables. To
detect noise dimensions, the latent variables are considered to be linear
combinations of two independent sets of second-order latent variables, where
only one contains the information about the cluster structure, defining a
discriminative subspace, while the other one contains noise dimensions.
Technically, the variables in the first set are distributed as a finite mixture
of Gaussians, while those in the second set are distributed as a multivariate
normal. It is important to note that when there are noise variables in the
dataset, they tend to coincide with the set of second-order noise latent
variables. If they are not present, the model may still be able to identify a
reduced set of second-order discriminative latent dimensions. This allows us to
reduce the number of parameters and identify the main features of the
clustering structure. The model specification involves multidimensional
integrals that make maximum likelihood estimation rather cumbersome and in some
cases infeasible. To overcome this issue, the model is estimated within the EM
framework maximizing the pairwise log-likelihood, i.e. the sum of all possible
log-likelihoods based on the bivariate marginals, as proposed in [38]. The
estimators obtained have been proven to be consistent, asymptotically unbiased
and normally distributed. In general they are less efficient than the full
maximum likelihood estimators, even if in many cases the loss in efficiency is
very small or almost null [2H ,
but they are much more efficient in terms of computational complexity.

The plan of the paper is the following: in Section 2 we present the
model; in Section 3 we describe how to take into account the presence of noise
dimensions and/or variables; the pairwise algorithm used to estimate
the model parameters is presented in Section 4. Sections 5, 6 and 7 deal with
the model identifiability issue, the interpretation of the output and the model
selection problem, respectively. Section 8 reports a simulation study conducted
to investigate the behaviour of the proposed methodology, while Section
9 shows an application to real data. The last section gives some concluding
remarks.


2 Model 


Let $x_1, x_2, \ldots, x_P$ be ordinal variables and $c_i = 1, \ldots, C_i$ the
associated categories for $i = 1, 2, \ldots, P$. There are
$R = \prod_{i=1}^{P} C_i$ possible response patterns, which have the following
form: $x_r = (x_1 = c_1, x_2 = c_2, \ldots, x_P = c_P)$. Let $y$ be the
heteroscedastic latent Gaussian mixture
$f(y) = \sum_{g=1}^{G} p_g \phi(y; \mu_g, \Sigma_g)$, where the $p_g$'s are the
mixing weights and $\phi(y; \mu_g, \Sigma_g)$ is the density of a $P$-variate
normal distribution with mean vector $\mu_g$ and covariance matrix $\Sigma_g$.
Under the URV approach, the ordinal variables are considered as a
discretization of $y$, i.e. generated by thresholding $y$, as follows,

\[ \gamma^{(i)}_{c_i - 1} \le y_i < \gamma^{(i)}_{c_i} \Leftrightarrow x_i = c_i, \]

where $-\infty = \gamma^{(i)}_{0} < \gamma^{(i)}_{1} < \ldots < \gamma^{(i)}_{C_i - 1} < \gamma^{(i)}_{C_i} = +\infty$
are the thresholds defining the $C_i$ categories.

Let us set $\psi = \{p_1, \ldots, p_G, \mu_1, \ldots, \mu_G, \Sigma_1, \ldots, \Sigma_G, \gamma\} \in \Psi$,
where $\Psi$ is the parameter space. The probability of response pattern $x_r$
is given by

\[ \Pr(x_1 = c_1, x_2 = c_2, \ldots, x_P = c_P; \psi)
   = \sum_{g=1}^{G} p_g \int_{\gamma^{(1)}_{c_1 - 1}}^{\gamma^{(1)}_{c_1}} \cdots \int_{\gamma^{(P)}_{c_P - 1}}^{\gamma^{(P)}_{c_P}} \phi(y; \mu_g, \Sigma_g)\, dy
   = \sum_{g=1}^{G} p_g \pi_r(\mu_g, \Sigma_g, \gamma), \]

where $\pi_r(\mu_g, \Sigma_g, \gamma)$ is the probability of response pattern
$x_r$ in cluster $g$ and $p_g$ is the probability of belonging to group $g$,
subject to $p_g > 0$ and $\sum_{g=1}^{G} p_g = 1$.

Thus, for a random i.i.d. sample of size $N$ the log-likelihood is

\[ \ell(\psi; x) = \sum_{r=1}^{R} n_r \log \left[ \sum_{g=1}^{G} p_g \pi_r(\mu_g, \Sigma_g, \gamma) \right], \tag{1} \]

where $n_r$ is the observed sample frequency of response pattern $r$ and
$\sum_{r=1}^{R} n_r = N$.
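The data-generating mechanism described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `discretize` and `simulate_ordinal`, the use of NumPy, and the numerical values in the example are ours and are not part of the original proposal. Each latent vector is drawn from one Gaussian component and then cut at the thresholds, so that category $c_i$ is the index of the interval containing $y_i$.

```python
import numpy as np


def discretize(y_col, cuts):
    """Map latent values to categories 1..C using the finite thresholds `cuts`.

    The category is c when cuts[c-2] <= y < cuts[c-1], with the outer limits
    understood to be -inf and +inf.
    """
    return np.searchsorted(np.asarray(cuts, dtype=float), y_col, side="right") + 1


def simulate_ordinal(n, weights, means, covs, thresholds, seed=0):
    """Draw n ordinal response patterns by thresholding a latent Gaussian mixture."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    # Sample the (unobserved) component label of each unit.
    labels = rng.choice(len(weights), size=n, p=weights)
    P = len(means[0])
    y = np.empty((n, P))
    for g in range(len(weights)):
        idx = np.flatnonzero(labels == g)
        y[idx] = rng.multivariate_normal(means[g], covs[g], size=idx.size)
    # Threshold every latent variable with its own set of cut points.
    x = np.column_stack([discretize(y[:, i], thresholds[i]) for i in range(P)])
    return x, labels


# Illustrative (assumed) values: two components, two variables, five categories.
x, labels = simulate_ordinal(
    n=500,
    weights=[0.3, 0.7],
    means=[[-2.0, 4.0], [2.5, 0.5]],
    covs=[np.diag([0.8, 0.8]), np.array([[1.0, 0.6], [0.6, 1.0]])],
    thresholds=[[0, 1, 2, 3], [0, 1, 2, 3]],
)
```

With four finite thresholds per variable, every simulated variable takes values in 1, ..., 5.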

3 How to detect the presence of noise variables

Sometimes noisy dimensions are present in the data. These are dimensions
that do not contain information about the cluster structure and could mask
the true classes. It means that there exists a proper discriminative subspace,
with a dimension less than the number of variables, where the clusters lie.
In order to identify the discriminative subspace, in the previously described
model it is assumed that there is a second-order set of $P$ latent variables
$\tilde{y}$, which in turn is formed of two independent subsets of variables.
In the first there are $Q$ (with $Q < P$) variables that have some clustering
information, while in the second set there are $P - Q$ noisy variables. Thus,
it is assumed that only the first $Q$ elements of $\tilde{y}$ carry any class
discrimination information, defining the so-called discriminative subspace.
Technically, the $Q$ informative elements, $\tilde{y}^{Q}$, are assumed to be
distributed as a mixture of Gaussians with class conditional means and
variances equal to $E(\tilde{y}^{Q} \mid g) = \eta_g$ and
$\mathrm{Cov}(\tilde{y}^{Q} \mid g) = \Omega_g$. The $P - Q$ noisy elements,
$\tilde{y}^{\bar{Q}}$, do not have information about the cluster structure;
it follows that they are independent of $\tilde{y}^{Q}$ and their distribution
does not vary from one class to another. In particular we assume that
$E(\tilde{y}^{\bar{Q}} \mid g) = \eta_0$ and
$\mathrm{Cov}(\tilde{y}^{\bar{Q}} \mid g) = \Omega_0$. The link between the two
orders of latent variables $y$ and $\tilde{y}$ is given by a non-singular
$P \times P$ matrix $A$, as $y = A\tilde{y}$. This means requiring a particular
structure on the mean vectors and covariance matrices of $y$. The assumption of
multivariate normality in each component provides a convenient way of
specifying the parameter structure. For each component $g$, the mean vector and
the covariance matrix have the following structures,

\[ \mu_g = E(y \mid g) = A\,E(\tilde{y} \mid g)
   = A \begin{pmatrix} \eta_{g,1} \\ \vdots \\ \eta_{g,Q} \\ \eta_{0,1} \\ \vdots \\ \eta_{0,P-Q} \end{pmatrix}
   = A \begin{pmatrix} \eta_g \\ \eta_0 \end{pmatrix} \]

and

\[ \Sigma_g = \mathrm{Cov}(y \mid g) = A\,\mathrm{Cov}(\tilde{y} \mid g)\,A'
   = A \begin{pmatrix} \Omega_g & 0 \\ 0 & \Omega_0 \end{pmatrix} A'. \]
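To make the reparameterization concrete, the following Python sketch assembles the first-order moments of one component from the second-order quantities. The helper name `first_order_moments` and the numerical values are illustrative assumptions, not the authors' code.

```python
import numpy as np


def first_order_moments(A, eta_g, eta_0, Omega_g, Omega_0):
    """Return (mu_g, Sigma_g) implied by y = A y_tilde for one component.

    eta_g, Omega_g: mean and covariance of the Q discriminative dimensions.
    eta_0, Omega_0: mean and covariance of the P - Q noise dimensions.
    """
    A = np.asarray(A, dtype=float)
    Q, n_noise = len(eta_g), len(eta_0)
    eta = np.concatenate([eta_g, eta_0])
    # Block-diagonal covariance: the two sets of variables are independent.
    Omega = np.zeros((Q + n_noise, Q + n_noise))
    Omega[:Q, :Q] = Omega_g
    Omega[Q:, Q:] = Omega_0
    mu = A @ eta
    Sigma = A @ Omega @ A.T
    return mu, Sigma


# Illustrative (assumed) values with P = 5 and Q = 2.
A = np.diag(np.sqrt([0.8, 0.8, 1.5, 1.5, 1.5]))
mu_g, Sigma_g = first_order_moments(
    A,
    eta_g=np.array([2.80, 0.56]),
    eta_0=np.zeros(3),
    Omega_g=np.array([[1.25, 0.75], [0.75, 1.25]]),
    Omega_0=np.eye(3),
)
```

Because only `eta_g` and `Omega_g` change with the component, all components share the same noise block in `Sigma_g` once `A` is fixed.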


4 Pairwise EM algorithm 


In the previous section we have seen how to reparametrize the model described
in Section 2 in order to identify discriminative/noise dimensions. An
efficient way to estimate it would be through the maximization of the
likelihood. However, the likelihood function involves multidimensional
integrals, whose evaluation becomes computationally demanding as the number of
observed variables increases. Indeed, multidimensional integrals must be
evaluated for each response pattern in the sample at several points of the
parameter space. Thus model estimation through a full maximum likelihood
approach becomes prohibitive with $P$ greater than 5 and is still demanding
with a very low number of variables $P$. As suggested in [38], the model is
estimated within the expectation-maximization (EM) framework maximizing
a pairwise likelihood. It is a robust estimation method and its estimators
have been proven to be consistent, asymptotically unbiased and normally
distributed, under regularity conditions [2111311 [35]. In general they are less
efficient than the full maximum likelihood estimators, but in many cases the
loss in efficiency is very small or almost null [HI [2S].

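The pairwise log-likelihood is

\[ p\ell(\psi; x) = \sum_{i=1}^{P-1} \sum_{j=i+1}^{P} \ell(\psi; (x_i, x_j))
   = \sum_{i=1}^{P-1} \sum_{j=i+1}^{P} \sum_{c_i=1}^{C_i} \sum_{c_j=1}^{C_j}
     n^{(ij)}_{c_i c_j} \log \left[ \sum_{g=1}^{G} p_g \pi^{(ij)}_{c_i c_j}(\mu_g, \Sigma_g, \gamma) \right], \tag{2} \]

where now, after the reparameterization, the set of model parameters is
$\psi = \{p_1, \ldots, p_G, \eta_0, \eta_1, \ldots, \eta_G, \Omega_0, \Omega_1, \ldots, \Omega_G, A, \gamma\}$,
$n^{(ij)}_{c_i c_j}$ is the observed
frequency of a response in category $c_i$ and $c_j$ for variables $x_i$ and
$x_j$ respectively, while $\pi^{(ij)}_{c_i c_j}(\mu_g, \Sigma_g, \gamma)$ is the
corresponding probability obtained by integrating the $(i, j)$ bivariate
marginal of the normal distribution with parameters $(\mu_g, \Sigma_g)$ between
the given thresholds.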

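Let $z$ denote the group membership matrix of order
$\left( \sum_{i=1}^{P-1} \sum_{j=i+1}^{P} C_i \times C_j \right) \times G$,
where $z^{(ij)}_{c_i c_j; g} = 1$ if the cell $(c_i, c_j)$ belongs to component
$g$ and $z^{(ij)}_{c_i c_j; g} = 0$ otherwise, for $g = 1, \ldots, G$. The
complete pairwise log-likelihood is

\[ p\ell_c(\psi; z, x) = \sum_{i=1}^{P-1} \sum_{j=i+1}^{P} \sum_{c_i=1}^{C_i} \sum_{c_j=1}^{C_j} \sum_{g=1}^{G}
   n^{(ij)}_{c_i c_j} z^{(ij)}_{c_i c_j; g}
   \left[ \log \pi^{(ij)}_{c_i c_j}(\mu_g, \Sigma_g, \gamma) + \log p_g \right]. \]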


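The E-step requires the computation of the expected value of the
complete-data pairwise log-likelihood given the current estimates of the model
parameters. This is given by

\[ E_{\hat{\psi}^{(t)}} \left[ p\ell_c(\psi; z, x) \mid x \right]
   = \sum_{i<j} \sum_{c_i=1}^{C_i} \sum_{c_j=1}^{C_j} \sum_{g=1}^{G}
     n^{(ij)}_{c_i c_j} \hat{z}^{(ij)(t)}_{c_i c_j; g}
     \left[ \log \pi^{(ij)}_{c_i c_j}(\mu_g, \Sigma_g, \gamma) + \log p_g \right], \tag{3} \]

where

\[ \hat{z}^{(ij)(t)}_{c_i c_j; g}
   = E_{\hat{\psi}^{(t)}} \left[ z^{(ij)}_{c_i c_j; g} = 1 \mid x_i = c_i, x_j = c_j \right]
   = \Pr \left( z^{(ij)}_{c_i c_j; g} = 1 \mid x_i = c_i, x_j = c_j; \hat{\psi}^{(t)} \right). \]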


In the M-step we maximize the expected complete pairwise log-likelihood in (3)
with respect to the model parameters, subject to some constraints that will be
specified in the sequel. Looking at the expected value in (3), the maximization
can be decomposed into two parts: the first corresponds to the component
parameters $(\mu_g, \Sigma_g)$ and the thresholds $\gamma$, the second to the
mixture weights $p_g$. The first part of the M-step has no closed form; hence,
to obtain the estimates, its maximization has been implemented in Matlab by
using the command "fmincon" [29] under some constraints that are explained in
detail in Section 5.

On the other hand, the estimates of the component weights $p_g$ have a closed
form and are easily obtained as follows,

\[ \hat{p}_g = \frac{\sum_{i=1}^{P-1} \sum_{j=i+1}^{P} \sum_{c_i=1}^{C_i} \sum_{c_j=1}^{C_j}
   n^{(ij)}_{c_i c_j} \hat{z}^{(ij)(t)}_{c_i c_j; g}}{N P (P-1)/2}, \]

with $g = 1, \ldots, G$.
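The E-step and the closed-form weight update can be sketched as follows. This is a minimal illustration under our own assumptions about the data layout: for each pair of variables, `pair_probs` holds an array of shape (G, C_i, C_j) with the bivariate cell probabilities of every component, and `pair_counts` holds the observed (C_i, C_j) table of frequencies. The function names are hypothetical, and the posterior is computed through Bayes' rule as the weighted cell probability of a component divided by the sum over all components.

```python
import numpy as np


def e_step(p, pair_probs):
    """Posterior membership probabilities z_hat[g, c_i, c_j] for every pair."""
    p = np.asarray(p, dtype=float)
    z_hat = []
    for pi in pair_probs:                      # pi has shape (G, C_i, C_j)
        joint = p[:, None, None] * pi          # p_g * pi_g(c_i, c_j)
        z_hat.append(joint / joint.sum(axis=0, keepdims=True))
    return z_hat


def update_weights(pair_counts, z_hat):
    """Closed-form M-step for the mixture weights."""
    num = sum((n[None, :, :] * z).sum(axis=(1, 2))
              for n, z in zip(pair_counts, z_hat))
    total = sum(n.sum() for n in pair_counts)  # equals N * P * (P - 1) / 2
    return num / total


# Toy input (assumed): P = 3 variables give 3 pairs, G = 2, three categories each.
rng = np.random.default_rng(1)
G, C, N = 2, 3, 200
p_current = np.array([0.4, 0.6])
pair_probs = [rng.dirichlet(np.ones(C * C), size=G).reshape(G, C, C) for _ in range(3)]
pair_counts = [
    rng.multinomial(N, (p_current[:, None, None] * pi).sum(axis=0).ravel()).reshape(C, C)
    for pi in pair_probs
]
z_hat = e_step(p_current, pair_probs)
p_new = update_weights(pair_counts, z_hat)
```

The denominator is computed from the tables themselves, which avoids hard-coding $N P (P-1)/2$ and keeps the update valid if some pairs are left out.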

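In order to ensure the positive-definiteness of the covariance matrices we
estimate them through their Cholesky decomposition. It means that the
objective function is maximized with respect to $T_g$ rather than $\Omega_g$,
where the $T_g$'s are upper triangular matrices such that
$\Omega_g = T_g' T_g$ for $g = 0, 1, \ldots, G$. Finally, the threshold
parameters do not change over the components, but each component is
characterised by a different set of parameters; now, standardizing each
component by making a change of variable, i.e.
$z_i = (y_i - \mu_{ig}) / \sigma_{ig}$, where $\mu_{ig}$ is the $i$-th element
of $\mu_g$ and $\sigma^2_{ig}$ the $i$-th diagonal element of $\Sigma_g$,
we obtain new integration limits changing over the components. These are
defined as

\[ \tau^{(i)}_{c_i; g} = \frac{\gamma^{(i)}_{c_i} - \mu_{ig}}{\sigma_{ig}}. \]

This allows us to compute the probability of a response in category $c_i$ and
$c_j$ for variables $x_i$ and $x_j$, respectively, in (3) as

\[ \pi^{(ij)}_{c_i c_j}(\mu_g, \Sigma_g, \gamma)
   = \Phi_2(\tau^{(i)}_{c_i; g}, \tau^{(j)}_{c_j; g}; \rho_{ij; g})
   - \Phi_2(\tau^{(i)}_{c_i - 1; g}, \tau^{(j)}_{c_j; g}; \rho_{ij; g})
   - \Phi_2(\tau^{(i)}_{c_i; g}, \tau^{(j)}_{c_j - 1; g}; \rho_{ij; g})
   + \Phi_2(\tau^{(i)}_{c_i - 1; g}, \tau^{(j)}_{c_j - 1; g}; \rho_{ij; g}), \]

where $\Phi_2(a, b; \rho)$ is the bivariate cumulative standard normal
distribution with correlation $\rho$ evaluated at the threshold parameters $a$
and $b$, and $\rho_{ij; g}$ is the correlation between $y_i$ and $y_j$ in
component $g$. As regards the classification, in [38] it has been suggested to
use an Iterative Proportional Fitting based on the pairwise posterior
probabilities obtained as output of the pairwise EM algorithm in order to
approximate the joint posterior probabilities.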
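The bivariate cell probabilities needed in the E-step can be sketched in Python as follows. The sketch assumes SciPy's `multivariate_normal` for the bivariate normal distribution function; the helper names `bvn_cdf` and `pair_cell_probs`, and the replacement of $+\infty$ by a large finite value, are our own choices rather than the authors' implementation (which relies on Matlab). The thresholds are standardized with the component mean and standard deviation, and each cell probability is obtained from four evaluations of the distribution function at the corners of the rectangle.

```python
import numpy as np
from scipy.stats import multivariate_normal


def bvn_cdf(a, b, rho):
    """Standard bivariate normal CDF Phi_2(a, b; rho) with infinite limits handled."""
    if a == -np.inf or b == -np.inf:
        return 0.0
    # A large finite value stands in for +inf.
    a, b = min(a, 40.0), min(b, 40.0)
    cov = [[1.0, rho], [rho, 1.0]]
    return float(multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf([a, b]))


def pair_cell_probs(gamma_i, gamma_j, mu, Sigma, i, j):
    """C_i x C_j table of cell probabilities for variables (i, j) in one component."""
    sd_i, sd_j = np.sqrt(Sigma[i, i]), np.sqrt(Sigma[j, j])
    rho = Sigma[i, j] / (sd_i * sd_j)
    # Standardized integration limits, padded with -inf and +inf.
    ti = np.concatenate([[-np.inf], (np.asarray(gamma_i, float) - mu[i]) / sd_i, [np.inf]])
    tj = np.concatenate([[-np.inf], (np.asarray(gamma_j, float) - mu[j]) / sd_j, [np.inf]])
    probs = np.empty((len(ti) - 1, len(tj) - 1))
    for a in range(1, len(ti)):
        for b in range(1, len(tj)):
            probs[a - 1, b - 1] = (
                bvn_cdf(ti[a], tj[b], rho) - bvn_cdf(ti[a - 1], tj[b], rho)
                - bvn_cdf(ti[a], tj[b - 1], rho) + bvn_cdf(ti[a - 1], tj[b - 1], rho)
            )
    return probs


# Illustrative (assumed) component with correlated variables and 5 categories each.
mu = np.array([2.5, 0.5])
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
table = pair_cell_probs([0, 1, 2, 3], [0, 1, 2, 3], mu, Sigma, 0, 1)

# Sanity case: independent standard normals cut at zero.
quarters = pair_cell_probs([0.0], [0.0], np.zeros(2), np.eye(2), 0, 1)
```

In the independence case every cell has probability close to 0.25, which gives a quick check of the rectangle formula.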


5 Model identifiability 


Model identifiability is a crucial issue, especially when latent variables are
involved in conjunction with ordinal data. The necessary conditions to
identify a mixture model for ordinal data using a pairwise likelihood approach
are discussed in detail in [38]. Here we report only the necessary condition
needed to identify the SCR model. We recall that the pairwise likelihood is
obtained as the product of all bivariate marginal likelihood contributions and
thus the maximum number of estimable parameters is equal to the number of
non-redundant parameters involved in the bivariate marginals. This equals
the number of parameters of a log-linear model with only two-factor
interaction terms. As a consequence, given a
$C_1 \times C_2 \times \ldots \times C_P$ contingency table, a necessary
condition for the identifiability of a model is that the number of
model parameters is at most

\[ \sum_{i=1}^{P} (C_i - 1) + \sum_{i=1}^{P-1} \sum_{j=i+1}^{P} (C_i - 1)(C_j - 1). \tag{5} \]


Furthermore, under the URV approach, the means and the variances of the
first-order latent variables are fixed to 0 and 1, respectively, because they
are not identified. In [38], the authors set the means and the variances of the
reference component to 0 and 1, respectively. This identification constraint
uniquely identifies the mixture components (ignoring the label switching
problem), as well described in [31]. This is sufficient to estimate both
thresholds and component parameters if all the observed variables have at least
three categories and when groups are known. As described in the following,
given the particular structure of the mean vectors and covariance matrices, it
is preferable to adopt an alternative (but equivalent) parametrization. This
is analogous to the one used by [22]; it consists in setting the first two
thresholds to 0 and 1, respectively. This means that there is a one-to-one
correspondence between the two sets of parameters.

Some other parameters in the covariance matrices can be set to a specified
value without loss of generality. To see this, let us consider a generic
configuration for the model parameters $\psi$. $A$ is a non-singular
$P \times P$ matrix; this can be decomposed into two sub-matrices
$A = [A_1, A_2]$ such that the covariance matrix $\Sigma_1$ can be written as

\[ \Sigma_1 = A \begin{pmatrix} \Omega_1 & 0 \\ 0 & \Omega_0 \end{pmatrix} A'
   = A \begin{pmatrix} \Omega_1 & 0 \\ 0 & 0 \end{pmatrix} A'
   + A \begin{pmatrix} 0 & 0 \\ 0 & \Omega_0 \end{pmatrix} A'
   = A_1 \Omega_1 A_1' + A_2 \Omega_0 A_2'. \]

In factor analysis, it is well known that there exist non-singular matrices
$S_1$ and $S_2$ such that $A_1 \Omega_1 A_1' = V_1 V_1'$ and
$A_2 \Omega_0 A_2' = V_2 V_2'$, where $A_1 = V_1 S_1$ and $A_2 = V_2 S_2$.
The matrices $V_1$ and $V_2$ have a particular structure.
Assuming that $P = 5$ and $Q = 3$, $V_1$ is of order $5 \times 3$ and it looks
like

\[ V_1 = \begin{pmatrix}
   v_{11} & 0 & 0 \\
   v_{21} & v_{22} & 0 \\
   v_{31} & v_{32} & v_{33} \\
   v_{41} & v_{42} & v_{43} \\
   v_{51} & v_{52} & v_{53}
   \end{pmatrix}, \]

while $V_2$ is of order $5 \times 2$ and it looks like

\[ V_2 = \begin{pmatrix}
   v_{11} & 0 \\
   v_{21} & v_{22} \\
   v_{31} & v_{32} \\
   v_{41} & v_{42} \\
   v_{51} & v_{52}
   \end{pmatrix}. \]

In other words, $V_1$ and $V_2$ have a lower triangular matrix in the first
$Q$ and $(P - Q)$ rows, respectively.

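As regards $\Sigma_g$ with $g = 2, \ldots, G$, it follows that

\[ \Sigma_g = A_1 \Omega_g A_1' + A_2 \Omega_0 A_2'
   = V_1 S_1 \Omega_g S_1' V_1' + V_2 V_2'
   = V_1 \Omega_g^{*} V_1' + V_2 V_2', \]

where $\Omega_g^{*} = S_1 \Omega_g S_1'$.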

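Finally, the factorization shown above does not create any problem for the
structure of the mean vectors. Indeed, we observe that

\[ \mu_g = A \begin{pmatrix} \eta_g \\ \eta_0 \end{pmatrix}
   = [V_1 S_1, V_2 S_2] \begin{pmatrix} \eta_g \\ \eta_0 \end{pmatrix}
   = [V_1, V_2] \begin{pmatrix} S_1 \eta_g \\ S_2 \eta_0 \end{pmatrix}, \]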


where $S_2$ is a matrix of order $(P - Q) \times (P - Q)$. Thus the number of
parameters needed to estimate the model with $Q$ variables carrying
classification power, $P - Q$ noisy variables and $G$ components is given by

\[ G - 1
   + \underbrace{Q(Q+1)/2 + Q(P - Q)}_{V_1}
   + (G - 1)\,Q(Q+1)/2
   + \underbrace{(P - Q)(P - Q + 1)/2 + Q(P - Q)}_{V_2}
   + GQ + P - Q
   + \underbrace{\sum_{i=1}^{P} C_i - 3P}_{\text{thresholds}}. \]

This should be less than or equal to the maximum number needed to saturate a
log-linear model with two-factor interaction terms in (5).
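The necessary condition can be checked numerically with a short Python sketch. The function names `max_pairwise_parameters` and `scr_parameters` are hypothetical; the first evaluates the bound in (5) and the second adds up the terms of the parameter count given above, in the same order.

```python
def max_pairwise_parameters(C):
    """Upper bound (5): parameters of a log-linear model with two-factor interactions."""
    P = len(C)
    main = sum(c - 1 for c in C)
    inter = sum((C[i] - 1) * (C[j] - 1) for i in range(P - 1) for j in range(i + 1, P))
    return main + inter


def scr_parameters(P, Q, G, C):
    """Number of free parameters of the SCR model with Q discriminative dimensions."""
    weights = G - 1
    v1 = Q * (Q + 1) // 2 + Q * (P - Q)                 # free entries of V_1
    omegas = (G - 1) * (Q * (Q + 1) // 2)               # covariances of components 2..G
    v2 = (P - Q) * (P - Q + 1) // 2 + Q * (P - Q)       # free entries of V_2
    means = G * Q + (P - Q)                             # eta_1..eta_G and eta_0
    thresholds = sum(C) - 3 * P                         # first two thresholds are fixed
    return weights + v1 + omegas + v2 + means + thresholds


# Example: five variables with five categories, Q = 2, two components.
C = [5, 5, 5, 5, 5]
needed = scr_parameters(P=5, Q=2, G=2, C=C)
bound = max_pairwise_parameters(C)
identifiable_candidate = needed <= bound
```

For this configuration the count is 42 against a bound of 180, so the necessary condition is satisfied; it remains a necessary condition only, not a sufficient one.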
6 Interpretation of matrix A

As said previously, the ordinal variables are assumed to be a partial
manifestation of first-order latent variables, which are linear combinations of
second-order latent variables. The main role of matrix $A$ is to specify the
coefficients of these linear combinations and thus to identify the noisy
variables. However, this raises some interpretation issues, as occurs in the
factor analysis framework. The solutions provided in the literature are
different, but they share the same idea: yielding a sparse (simple) matrix that
is easier to interpret. To this aim, varimax and oblimin are the most popular
types of rotation, frequently used in the orthogonal and non-orthogonal cases,
respectively. Many other ways of easing the interpretation exist.
For the current proposal we could apply a varimax rotation to
$A_2$ and $A_1$.

Furthermore, the matrix $A$ plays a central role in estimating the
correlation between the latent variables of first and second order, whose
covariance matrix is given by
$\mathrm{Cov}(y, \tilde{y}) = A \Sigma_{\tilde{y}}$; we remark that
$\Sigma_{\tilde{y}} = \mathrm{Cov}(\tilde{y})$ accounts
for both the within and the between variance of the mixture. The observed
variables that are most correlated with the second-order noise latent variables
are identified as noise variables.
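As an illustration of the rotation step, the sketch below implements the usual SVD-based varimax iteration in NumPy. It is a generic textbook routine, not the authors' code; the function name `varimax` and the random loading matrix are assumptions made for the example. The rotation is orthogonal, so it changes the loadings but leaves the product of the loading matrix with its transpose unchanged.

```python
import numpy as np


def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a p x k loading matrix.

    Returns the rotated loadings and the k x k rotation matrix R.
    """
    L = np.asarray(L, dtype=float)
    p, k = L.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # Gradient-like matrix of the varimax criterion.
        B = L.T @ (LR ** 3 - LR @ np.diag((LR ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(B)
        R = U @ Vt
        crit = s.sum()
        # Stop when the criterion no longer increases appreciably.
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break
        crit_old = crit
    return L @ R, R


# Illustrative (assumed) 5 x 2 block of loadings, e.g. the columns of A_1.
rng = np.random.default_rng(0)
A1 = rng.normal(size=(5, 2))
A1_rot, R = varimax(A1)
```

Since the covariance structure only involves products such as `A1 @ A1.T` (up to the component covariances), an orthogonal rotation of this kind changes the interpretation of the columns but not the fitted first-order covariance when the rotated block has identity covariance.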


7 Model Selection 

In the estimation procedure, we assume that both the number of mixture
components and the number of noisy variables are fixed. In practice, they
are unknown and thus must be estimated from the observed data. The
best fitted model is chosen by selecting the model minimizing the C-BIC,
introduced by [12],

\[ \text{C-BIC} = -2\, p\ell(\hat{\psi}; x) + \mathrm{tr}(\hat{H}^{-1} \hat{V}) \log N, \tag{6} \]

where $H$ is the sensitivity matrix, $H = E(-\nabla^2 p\ell(\psi; x))$, while
$V$ is the variability matrix (the covariance matrix of the score vector),
$V = \mathrm{Var}(\nabla p\ell(\psi; x))$.
The C-BIC has the same structure as the BIC; the only difference is the way
model complexity is accounted for. The BIC penalizes the likelihood by
the term $d \log N$, where $d$ is the total number of essential parameters. On
the other hand, the C-BIC penalizes the likelihood by
$\mathrm{tr}(H^{-1} V) \log N$. In this case, the identity $H = V$ does not
hold, since the likelihood components are not independent (differently from
full likelihood theory). However, if $H = V$, then $\mathrm{tr}(H^{-1} V)$
would be equal to $d$. Sample estimates of $H$ and $V$ for the model proposed
are

\[ \hat{H} = -\frac{1}{N} \sum_{r=1}^{R} n_r \nabla^2 p\ell(\hat{\psi}; x_r) \]

and

\[ \hat{V} = \frac{1}{N} \sum_{r=1}^{R} n_r \left( \nabla p\ell(\hat{\psi}; x_r) \right) \left( \nabla p\ell(\hat{\psi}; x_r) \right)'. \]

A simulation study testing its performance in the context of mixture models
has been provided in |3H1 EH]. In the current work, in order to obtain the
empirical estimates of the sensitivity and variability matrices, we have used
the same numerical approximation technique described there. More precisely,
the derivatives are estimated by finite differences. As regards the variability
matrix, a covariance matrix of the score function has been estimated for each
response pattern. Computationally speaking, it has been obtained by
multiplying a matrix including the score functions for each response pattern
times a diagonal matrix with the frequencies $n_r$ on the main diagonal times
the first matrix transposed. As regards the sensitivity matrix, we know from
the theoretical results on the pairwise likelihood that each sub-likelihood
(each component of the pairwise likelihood) is a true likelihood. This means
that the second Bartlett identity holds. This allows us to estimate the
sensitivity matrix in the same fashion as before. However, in this case the
diagonal matrix has the frequencies $n_{x_i x_j}$ on the main diagonal and the
score functions refer to each response pattern for each pair of variables.
Finally, the trace is obtained by summing the generalized eigenvalues of the
two matrices, i.e. by solving the equation $Vx = \lambda Hx$. This avoids
inverting the sensitivity matrix, which may be imprecise and unstable.
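The last step, computing the penalty from generalized eigenvalues rather than from an explicit inverse, can be sketched as follows. The sketch assumes SciPy's `scipy.linalg.eigh` for the symmetric-definite generalized eigenproblem and takes the estimated matrices as given; the function name `c_bic`, the assumed form "minus twice the pairwise log-likelihood plus penalty times log N", and the random test matrices are illustrative.

```python
import numpy as np
from scipy.linalg import eigh


def c_bic(pairwise_loglik, H, V, N):
    """Composite BIC with penalty tr(H^{-1} V), computed without inverting H.

    H is assumed symmetric positive definite and V symmetric.
    """
    # Generalized eigenvalues solve V x = lambda H x; their sum is tr(H^{-1} V).
    lam = eigh(V, H, eigvals_only=True)
    penalty = float(lam.sum())
    return -2.0 * pairwise_loglik + penalty * np.log(N), penalty


# Illustrative (assumed) sensitivity and variability matrices of dimension d = 4.
rng = np.random.default_rng(0)
B1, B2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
H = B1 @ B1.T + 4.0 * np.eye(4)
V = B2 @ B2.T + 1.0 * np.eye(4)
value, penalty = c_bic(pairwise_loglik=-1234.5, H=H, V=V, N=1000)

# When the two matrices coincide, the penalty reduces to the number of parameters.
_, penalty_equal = c_bic(pairwise_loglik=-1234.5, H=H, V=H, N=1000)
```

Comparing `penalty` with `np.trace(np.linalg.solve(H, V))` on a well-conditioned example is a simple way to confirm that the two routes agree.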

8 Simulation study 

To evaluate the empirical behaviour of the proposal, a large-scale simulation
study has been conducted. The performance has been evaluated in terms
of recovering the true cluster structure using the following measures: the
loss measure (L) between the estimated and the true model and the Adjusted
Rand Index (ARI) [20] between the true hard partition matrix and
the estimated one. The former compares clusterings by set matching and
is given by the quadratic mean of the differences between the true and the
estimated posterior probabilities. Since label switching plays an important
role, we compute it for every possible permutation of the cluster membership
labels of the resulting partition of $N$ individuals and we choose the minimum
value obtained. A smaller value indicates a better performance, with
$0 \le L \le 1$. The second index can be considered a hard classification
measure, while the former is a fuzzy index. Given two different hard
classification matrices, $W$ and $\hat{W}$, i.e. binary matrices in which each
row assigns an observation to exactly one cluster, the ARI counts the pairs of
observations that are assigned to the same or different clusters under both
partition matrices and it is defined as

\[ \mathrm{ARI}(W, \hat{W}) = \frac{R(W, \hat{W}) - E(R(W, \hat{W}))}{1 - E(R(W, \hat{W}))}, \tag{7} \]

where

\[ R(W, \hat{W}) = \frac{N_{11} + N_{00}}{\binom{N}{2}} \]

is the Rand Index, $W$ and $\hat{W}$ are the true and the estimated
partition matrices respectively, $N_{11}$ is the number of pairs of observations
in the same cluster under $W$ and $\hat{W}$, and $N_{00}$ is the number of pairs
in different clusters under $W$ and $\hat{W}$; $N$ is the sample size. The index
has expected value zero for independent clusterings and maximum value 1 for
identical clusterings.
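Both measures can be sketched in Python. The ARI below uses the standard contingency-table form of the index, which is algebraically the same correction for chance as in (7); the loss is taken as the root mean square difference between the two posterior matrices, minimized over label permutations. The exact normalization used for the loss in the simulations is an assumption on our part, and the function names are hypothetical.

```python
import numpy as np
from itertools import permutations
from math import comb


def adjusted_rand_index(a, b):
    """ARI between two hard partitions given as label vectors."""
    a, b = np.asarray(a), np.asarray(b)
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=int)
    np.add.at(table, (ai, bi), 1)              # contingency table of the two partitions
    sum_cells = sum(comb(int(v), 2) for v in table.ravel())
    sum_rows = sum(comb(int(v), 2) for v in table.sum(axis=1))
    sum_cols = sum(comb(int(v), 2) for v in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(a.size, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    return (sum_cells - expected) / (max_index - expected)


def loss_measure(post_true, post_est):
    """Root mean square difference between posteriors, minimized over label permutations."""
    T = np.asarray(post_true, dtype=float)
    E = np.asarray(post_est, dtype=float)
    G = T.shape[1]
    return min(
        float(np.sqrt(np.mean((T - E[:, list(perm)]) ** 2)))
        for perm in permutations(range(G))
    )


# Small illustrative checks.
ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
ari_partial = adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])
post = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
loss_swapped = loss_measure(post, post[:, ::-1])
```

Relabelling the clusters leaves both measures unchanged, which is why the first check returns 1 and the last one returns 0. The ARI function is undefined (division by zero) in the degenerate case where both partitions consist of a single cluster.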

Eight different scenarios have been considered, with and without
noise variables. In both cases we simulated 250 samples from a latent
two-component Gaussian mixture model. However, in the first case we simulated
five ordinal variables with five categories, but we assumed that only two
($Q = 2$) variables carry group discrimination information, the others being
noise variables. In the second case, we assumed that three variables are less
informative about the cluster structure. Nevertheless, their means and variances
still change across the groups (differently from the assumptions of the SCR
model).

Under these two broad conditions, we have analysed four scenarios considering
two different experimental factors: the sample size ($N = 1000, 5000$) and
the separation between clusters (well separated or not).

Given the simulated ordinal data, we compared the performance of the SCR
model with the standard clustering model proposed in [38]. The parameter
estimates were obtained through a pairwise EM algorithm, which was
initialized using rational starting points. In other words, we first fitted a
Gaussian mixture model, treating the ranks as continuous. Then, we used
its output as starting values. The algorithms were stopped when the increase in the
asymptotic estimate log-likelihood between two consecutive steps was less 
than 10“^. 

In the sequel, we analyze the simulation output in the case in which three 
noise variables exist; then, we analyze the case in which three variables are 
less informative about the cluster structure. This section ends with a com¬ 
parison between these two main conditions. 

Below, we report the true values used to generate the data according to the 
SCR model, i.e. the case in which there are three noise variables. 
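Table 1: True values of the latent mixture model and thresholds under
different scenarios. The data were generated according to the structure
assumed by the SCR model.

Common parameters (in terms of $A$, $\eta$ and $\Omega$)

  Component weights:          $p_1 = 0.3$, $p_2 = 0.7$
  Means of noisy variables:   $\eta_0 = [0, 0, 0]'$
  Covariance matrices:        $\Omega_0 = I_3$
  Thresholds:                 for each variable, $[0, 1, 2, 3]$

Separated groups

  Parametrization in terms of $A$, $\eta$ and $\Omega$

  $$\eta_1 = [-2.24, 4.47]', \qquad \eta_2 = [2.80, 0.56]', \qquad
    \Omega_2 = \begin{bmatrix} 1.25 & 0.75 \\ 0.75 & 1.25 \end{bmatrix}$$

  $$A = \mathrm{diag}\left(\sqrt{0.8}, \sqrt{0.8}, \sqrt{1.5}, \sqrt{1.5}, \sqrt{1.5}\right)$$

  Parametrization in terms of $\mu$ and $\Sigma$

  $$\mu_1 = [-2, 4, 0, 0, 0]', \qquad \mu_2 = [2.5, 0.5, 0, 0, 0]'$$

  $$\Sigma_1 = \begin{bmatrix}
      0.8 & 0 & 0 & 0 & 0 \\
      0 & 0.8 & 0 & 0 & 0 \\
      0 & 0 & 1.5 & 0 & 0 \\
      0 & 0 & 0 & 1.5 & 0 \\
      0 & 0 & 0 & 0 & 1.5
    \end{bmatrix}, \qquad
    \Sigma_2 = \begin{bmatrix}
      1.0 & 0.6 & 0 & 0 & 0 \\
      0.6 & 1.0 & 0 & 0 & 0 \\
      0 & 0 & 1.5 & 0 & 0 \\
      0 & 0 & 0 & 1.5 & 0 \\
      0 & 0 & 0 & 0 & 1.5
    \end{bmatrix}$$

Non-separated groups

  Parametrization in terms of $A$, $\eta$ and $\Omega$

  $$\eta_1 = [-0.408, 2.86]', \qquad \eta_2 = [2.04, 0.41]', \qquad
    \Omega_2 = \begin{bmatrix} 2.2 & 1.3 \\ 1.3 & 2.2 \end{bmatrix}$$

  $$A = \mathrm{diag}\left(\sqrt{1.5}, \sqrt{1.5}, \sqrt{1.5}, \sqrt{1.5}, \sqrt{1.5}\right)$$

  Parametrization in terms of $\mu$ and $\Sigma$

  $$\mu_1 = [-0.5, 3.5, 0, 0, 0]', \qquad \mu_2 = [2.5, 0.5, 0, 0, 0]'$$

  $$\Sigma_1 = \begin{bmatrix}
      1.5 & 0 & 0 & 0 & 0 \\
      0 & 1.5 & 0 & 0 & 0 \\
      0 & 0 & 1.5 & 0 & 0 \\
      0 & 0 & 0 & 1.5 & 0 \\
      0 & 0 & 0 & 0 & 1.5
    \end{bmatrix}, \qquad
    \Sigma_2 = \begin{bmatrix}
      3.30 & 1.95 & 0 & 0 & 0 \\
      1.95 & 3.30 & 0 & 0 & 0 \\
      0 & 0 & 1.5 & 0 & 0 \\
      0 & 0 & 0 & 1.5 & 0 \\
      0 & 0 & 0 & 0 & 1.5
    \end{bmatrix}$$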
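The link between the two parametrizations of Table 1, and the way ordinal
data arise from them, can be illustrated with a short sketch for the
separated scenario. It relies on several assumptions that are not stated in
the table itself: the first-order moments are taken to be $\mu_g = A\eta_g$
and $\Sigma_g = A\Omega_g A'$, $\Omega_1$ is taken to be the identity, the
first entry of $\eta_2$ is taken to be positive so that $A\eta_2$ matches
$\mu_2$, and all function names are hypothetical.

```python
import math
import random

# Separated scenario of Table 1 (two discriminative dimensions).
a = math.sqrt(0.8)                       # diagonal loading of both dimensions
eta = {1: [-2.24, 4.47], 2: [2.80, 0.56]}
omega = {1: [[1.0, 0.0], [0.0, 1.0]],    # assumption: identity for component 1
         2: [[1.25, 0.75], [0.75, 1.25]]}
weights = {1: 0.3, 2: 0.7}
thresholds = [0.0, 1.0, 2.0, 3.0]


def first_order_moments(g):
    """Return (mu, Sigma) = (A eta, A Omega A') for the diagonal A = a * I."""
    mu = [a * e for e in eta[g]]
    sigma = [[a * a * w for w in row] for row in omega[g]]
    return mu, sigma


def categorize(value):
    """Map a latent value to an ordinal category 1..5 via the thresholds."""
    return 1 + sum(value > t for t in thresholds)


def draw_observation(rng):
    """Draw one component label and the five ordinal variables."""
    g = 1 if rng.random() < weights[1] else 2
    mu, sigma = first_order_moments(g)
    # Cholesky factor of the 2x2 covariance block, written out by hand.
    l11 = math.sqrt(sigma[0][0])
    l21 = sigma[1][0] / l11
    l22 = math.sqrt(sigma[1][1] - l21 ** 2)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    latent = [mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2]
    # Three noise variables: mean 0 and variance 1.5 in both components.
    latent += [rng.gauss(0, math.sqrt(1.5)) for _ in range(3)]
    return g, [categorize(v) for v in latent]
```

Calling `first_order_moments(1)` and `first_order_moments(2)` recovers, up to
rounding, the first two entries of $\mu_1$, $\mu_2$ and the leading blocks of
$\Sigma_1$, $\Sigma_2$ reported in the table.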
Figure 1: Box-plots of ARI for the posterior probabilities. Data generated
from a two-component latent mixture; 5 ordinal variables with 5 categories;
three of them are noisy variables. N=1000, 5000. Separated/non-separated
groups. 250 samples.
[Panel titles: Separated groups, N=1000; Separated groups, N=5000]



All simulation results are reported in the appendix. Figures 1 and 2 show
the distributions of the adjusted Rand index (ARI) and of the loss measure,
respectively, in the four scenarios. In the left-hand panels the sample size
is equal to 1000, while in the right-hand panels it is equal to 5000; in the
first row the groups are separated, while in the second row they are not.
For clarity and comparability, the range of the y-axis has been truncated to
[0.5, 1] for the ARI and to [0, 0.5] for the loss measure. The pairwise
estimators show consistency: as N increases we obtain better classification
performance, and the variances of the ARI and of the loss become smaller.
Furthermore, the clustering performance becomes poorer as the components
become less separated. Comparing the two fitted models, we observe that SCR
outperforms the pairwise clustering in all scenarios, as expected. However,
the gap in performance depends on the specific scenario. In general, the gap
seems to increase when the groups are less separated and the sample size is
smaller.
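For readers who wish to reproduce this kind of comparison, the adjusted Rand
index between a true and an estimated partition can be computed as in the
sketch below. It uses the standard Hubert and Arabie formula; the function
name is hypothetical, and the hard assignment of each observation to a
cluster (for instance by maximum posterior probability) is assumed to have
been done beforehand.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index (Hubert and Arabie form) of two partitions."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("at least two observations are required")

    # Contingency table cells and its two margins.
    cell_counts = Counter(zip(labels_true, labels_pred))
    row_counts = Counter(labels_true)
    col_counts = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_counts.values())
    sum_cols = sum(comb(c, 2) for c in col_counts.values())

    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        # Degenerate case, e.g. both partitions consist of a single cluster.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)
```

The index is invariant to a relabelling of the clusters, so no matching
between estimated and true component labels is needed before calling it.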

Now, we report the simulation results for the case in which the data were
not generated according to the structure assumed by the SCR model. They
were assumed to be a categorization of a latent two-component Gaussian
mixture model, whose true parameters are reported below. There are three
less informative variables; they are less informative in the sense that their
means and variances change only slightly over the components. In other words,
based on these variables, the two components overlap almost completely.

Figure 2: Box-plots of LOSS for the posterior probabilities. Data generated
from a two-component latent mixture; 5 ordinal variables with 5 categories;
three of them are noisy variables. N=1000, 5000. Separated/non-separated
groups. 250 samples.
[Panel titles: Separated groups, N=1000; Separated groups, N=5000]
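Table 2: True values of the latent mixture model and thresholds under
different scenarios. The data were generated by thresholding a latent
two-component Gaussian mixture model.

Separated groups

  $$p_1 = 0.3, \qquad p_2 = 0.7$$

  $$\mu_1 = [-2, 4, 0, -0.5, 0]', \qquad \mu_2 = [2.5, 0.5, 0.5, 0, 0.5]'$$

  $$\Sigma_1 = \begin{bmatrix}
      0.8 & 0 & 0 & 0 & 0 \\
      0 & 0.8 & 0 & 0 & 0 \\
      0 & 0 & 1.5 & 0 & 0 \\
      0 & 0 & 0 & 1.5 & 0 \\
      0 & 0 & 0 & 0 & 1.5
    \end{bmatrix}, \qquad
    \Sigma_2 = \begin{bmatrix}
      1.25 & 0.75 & 0 & 0 & 0 \\
      0.75 & 1.25 & 0 & 0 & 0 \\
      0 & 0 & 1.0 & 0 & 0 \\
      0 & 0 & 0 & 1.0 & 0 \\
      0 & 0 & 0 & 0 & 1.0
    \end{bmatrix}$$

  Thresholds for each variable: $[0, 1, 2, 3]$

Non-separated groups

  $$p_1 = 0.3, \qquad p_2 = 0.7$$

  $$\mu_1 = [-0.5, 3.5, 0, -0.5, 0]', \qquad \mu_2 = [2.5, 0.5, 0.5, 0, 0.5]'$$

  $$\Sigma_1 = \begin{bmatrix}
      1.5 & 0 & 0 & 0 & 0 \\
      0 & 1.5 & 0 & 0 & 0 \\
      0 & 0 & 1.5 & 0 & 0 \\
      0 & 0 & 0 & 1.5 & 0 \\
      0 & 0 & 0 & 0 & 1.5
    \end{bmatrix}, \qquad
    \Sigma_2 = \begin{bmatrix}
      2.2 & 1.3 & 0 & 0 & 0 \\
      1.3 & 2.2 & 0 & 0 & 0 \\
      0 & 0 & 1.0 & 0 & 0 \\
      0 & 0 & 0 & 1.0 & 0 \\
      0 & 0 & 0 & 0 & 1.0
    \end{bmatrix}$$

  Thresholds for each variable: $[0, 1, 2, 3]$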
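To make the notion of a "less informative" variable concrete, the sketch
below computes the category probabilities implied by the thresholds
[0, 1, 2, 3] within each component, and compares the two components through
the total variation distance. The choice of the total variation distance as
an overlap measure and all function names are illustrative assumptions; the
means and variances are those of the separated scenario of Table 2.

```python
from math import erf, sqrt

THRESHOLDS = [0.0, 1.0, 2.0, 3.0]


def normal_cdf(x, mean, var):
    """Cumulative distribution function of a N(mean, var) variable."""
    return 0.5 * (1.0 + erf((x - mean) / sqrt(2.0 * var)))


def category_probabilities(mean, var, thresholds=THRESHOLDS):
    """Probabilities of the ordinal categories obtained by thresholding."""
    cuts = [normal_cdf(t, mean, var) for t in thresholds]
    edges = [0.0] + cuts + [1.0]
    return [hi - lo for lo, hi in zip(edges, edges[1:])]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# First variable (discriminative): component moments differ markedly.
tv_first = total_variation(category_probabilities(-2.0, 0.8),
                           category_probabilities(2.5, 1.25))

# Third variable (less informative): moments change only slightly.
tv_third = total_variation(category_probabilities(0.0, 1.5),
                           category_probabilities(0.5, 1.0))
```

A distance close to one means that the two components give almost disjoint
category distributions for that variable, while a distance close to zero
means that the variable barely distinguishes them.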


All simulation results are reported in the appendix. Figures 3 and 4 show
the distributions of the adjusted Rand index and loss measure, respectively,
in the four different scenarios. Once again, the pairwise estimators show
consistency. As the degree of overlap between components increases, the
performances worsen. Comparing the two fitted models, the only scenario in
which their performances are almost the same is the easiest one, i.e. when
the groups are separated and the thresholds are equidistant. In all other
scenarios, it seems that the presence of three less informative variables
masks the cluster structure, which is therefore not successfully recovered by
the pairwise clustering model. Conversely, the SCR model recognizes the
presence of some noise dimensions and identifies the two variables carrying
the discriminative classification information, using fewer parameters (in
other words, it is a more parsimonious model). This leads to better results
in terms of clustering performance.

Finally, we briefly compare the two main conditions: the existence of noise
variables versus the existence of less informative variables. When there are
less informative variables (i.e. looking at Figures 3 and 4) we note that the
performances of the pairwise clustering model improve, compared to Figures 1
and 2. This is somewhat expected since, in this case, even if the cluster
structure could be masked, there is no mis-specification between the
data-generating process and the fitted model. Nevertheless, owing to the
presence of some less informative variables, it is still outperformed by the
SCR model. On the other hand, this shows some degree of robustness of the
SCR model; in other words, even if the data were generated from a
mis-specified model, this does not affect its performance.

Figure 3: Box-plots of ARI for the posterior probabilities. Data generated
from a latent two-component mixture model; 5 ordinal variables with 5
categories; three of them are less informative about the cluster structure.
N=1000, 5000. Separated/non-separated groups. 250 samples.
[Panel titles: Separated groups, N=1000; Separated groups, N=5000]

Figure 4: Box-plots of LOSS for the posterior probabilities. Data generated
from a latent two-component mixture model; 5 ordinal variables with 5
categories; three of them are less informative about the cluster structure.
N=1000, 5000. Separated/non-separated groups. 250 samples.
[Panel titles: Separated groups, N=1000; Separated groups, N=5000]


9 Application to Real Data 

In this section the proposed modelling methodology is applied to a real 
dataset. 


9.1 General Social Survey dataset 

To illustrate how the model can be used, we apply it to a set of data taken
from the General Social Survey and displayed in Table 3. This is a well-known
dataset in the educational field, which has been analysed and recently
re-analysed by several authors, among them [38]. It is a three-way
cross-classification table of 1,517 people on three ordinal variables:
happiness (3 categories), years of completed schooling (4 categories), and
number of siblings (5 categories).


Table 3: Three-way cross-classification of U.S. sample according to their
reported happiness, years of schooling and number of siblings

Years of school               Number of siblings
completed            0-1     2-3     4-5     6-7      8+

Not too Happy
< 12                  15      34      36      22      61
12                    31      60      46      25      26
13-16                 35      45      30      13       8
17+                   18      14       3       3       4

Pretty Happy
< 12                  17      53      70      67      79
12                    60      96      45      40      31
13-16                 63      74      39      24       7
17+                   15      15       9       2       1

Very Happy
< 12                   7      20      23      16      36
12                     5      12      11      12       7
13-16                  5      10       4       4       3
17+                    1       2       9       0       1
Figure 5: Heat map of posterior probabilities

We initialized the pairwise EM algorithm from 100 different random
starting points. We ran 9 different configurations, varying both the number
of clusters G = 1, 2, 3 and the number of variables with classification power
Q = 1, 2, 3. Models with G greater than 3 cannot be identified. The final
model is chosen by minimizing the C-BIC.

The best fitted model is given by G = 2 and Q = 1 (see Table 4), with the
component weights equal to 0.28 and 0.72, respectively. Figure 5 represents
the posterior probabilities of belonging to the largest component. It is
worth noting that the two groups become clearly separated as the number of
completed years of schooling increases. Moreover, it is interesting to
note that years of completed schooling is the only variable with
discriminative power, since the posterior probabilities do not change
substantially over the levels of happiness or the number of siblings.
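The model choice amounts to a search for the smallest C-BIC over the grid of
candidate (Q, G) pairs. A minimal sketch, using the values reported in
Table 4 and a hypothetical helper name, is the following.

```python
# C-BIC values from Table 4, indexed by (Q, G).
c_bic_table = {
    (1, 1): 24717, (1, 2): 22848, (1, 3): 22890,
    (2, 1): 23151, (2, 2): 22881, (2, 3): 22891,
    (3, 1): 22937, (3, 2): 22896, (3, 3): 22972,
}


def select_model(c_bic):
    """Return the (Q, G) pair with the smallest C-BIC and its value."""
    best = min(c_bic, key=c_bic.get)
    return best, c_bic[best]


best_pair, best_value = select_model(c_bic_table)
```

Here `best_pair` holds the selected number of discriminative dimensions Q
and of components G, in that order.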

Table 4: Model choice according to the composite information criterion
C-BIC.

           G=1      G=2      G=3
Q=1      24717    22848    22890
Q=2      23151    22881    22891
Q=3      22937    22896    22972

The correlation between the first- and second-order variables (by rows
and by columns, respectively) is

$$\begin{bmatrix}
   0.9997 & -0.0797 & -0.0094 \\
  -0.4763 &  0.8977 & -0.0988 \\
  -0.1740 &  0.1015 &  0.9824
\end{bmatrix}$$

This leads to some straightforward conclusions: to detect the noisy
variables we should look at the highest correlations in the last two columns
$(y_2, y_3)$. The most correlated variables are $y_2$ and $y_3$, with
correlations equal to 0.90 and 0.98, respectively.
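The rule just described, namely reading off, for each noise column, the row
with the largest correlation, can be sketched as follows. The function name
is hypothetical, indices are zero-based, and the use of the absolute value
of the correlation is an assumption made here so that strongly negative
correlations would also be detected.

```python
# Correlations between first-order (rows) and second-order (columns)
# latent variables, as reported in the text for the model with Q = 1.
correlations = [
    [0.9997, -0.0797, -0.0094],
    [-0.4763, 0.8977, -0.0988],
    [-0.1740, 0.1015, 0.9824],
]


def match_noise_dimensions(corr, n_discriminative):
    """Map each noise column to the row most correlated with it.

    Columns 0 .. n_discriminative - 1 are the discriminative dimensions;
    the remaining columns are treated as noise dimensions.
    """
    matches = {}
    for col in range(n_discriminative, len(corr[0])):
        row = max(range(len(corr)), key=lambda r: abs(corr[r][col]))
        matches[col] = (row, corr[row][col])
    return matches


noise_matches = match_noise_dimensions(correlations, n_discriminative=1)
```

Each entry of `noise_matches` pairs a noise dimension with the observed
variable that loads most heavily on it, together with the corresponding
correlation.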

Furthermore, in order to check the behaviour of our proposal, we have added
to the original dataset a noisy ordinal variable with three categories,
obtained by thresholding a standard normal variable. As expected, the best
fitted model, i.e. the one minimizing the C-BIC, is the model with G = 2
and Q = 1, with a C-BIC value of 44133.

Table 5: Model choice according to the composite information criterion
C-BIC.

           G=1      G=2      G=3
Q=1      44407    44133    44151
Q=2      44719    44182    44166
Q=3      44423    44162    44186
Q=4      44809    44219    44313

The correlation between the first- and second-order variables (by rows and
by columns, respectively) is

$$\left[\begin{array}{r|rrr}
   0.9986 & -0.1726 & -0.0259 & -0.0096 \\
  \hline
  -0.4800 &  0.9286 &  0.0286 &  0.0019 \\
  -0.2157 &  0.0349 &  0.9816 &  0.0038 \\
  -0.0580 & -0.0119 &  0.0260 &  0.9985
\end{array}\right]$$

This leads to some straightforward conclusions: to detect the noisy
variable we should look at the highest correlations in the last three columns
$(y_2, y_3, y_4)$. The most correlated variables are $y_2$, $y_3$ and $y_4$,
with correlations equal to 0.93, 0.98 and 1, respectively.


10 Concluding Remarks 

In this paper an extension of the model proposed by [IIII251EH] has been 
introduced. The proposal allows the selection of the variables that are
significant for clustering. Indeed, in many applications it is possible that
only some variables have classification power. From a statistical modelling
point of view, this means requiring a particular structure for the means and
the covariance matrices. Following the URV approach, the ordinal variables
are considered a partial manifestation of first-order latent variables. To
detect the presence of noisy variables and/or dimensions, these are assumed
to be linear combinations of two independent sets of second-order latent
variables. The proposal thus reduces and clusters ordinal data
simultaneously. Nevertheless, if there is no noisy variable, but only noisy
dimensions, it reduces to a more parsimonious mixture model for clustering
ordinal data (compared to the proposals existing in the literature).
Whatever the structure is (apart from the independence case), the full
likelihood always involves multidimensional integrals that cannot be
computed in closed form. For this reason, the parameter estimation is
carried out through the maximization of an easier surrogate function, namely
the pairwise likelihood. In order to classify the observations, the posterior
probabilities are reconstructed through the IPF algorithm. After exploring
the effectiveness of the proposal through a large-scale simulation study, an
application to a real dataset has been presented. To validate the proposal, a
further experiment has been conducted: an ordinal noisy variable has been
added to the original General Social Survey dataset. In all cases the best
fitted model has been chosen by minimizing the information criterion C-BIC.
Even if the proposal seems to be promising, there are some open issues. For
example, in the current work we do not provide a graphical representation
of the output in a reduced space. This is not straightforward for two main
reasons: ordinal variables do not have a friendly graphical representation
and, furthermore, there exist two different orders of latent variables.
However, this challenge gives us motivation for further research.
Appendix

10.1 Data generated from the SCR model 


Table 6: Simulation results: ARI and loss for the posterior probabilities.
Data generated from a two-component latent mixture; 5 ordinal variables
with 5 categories; three of them are noisy variables. Separated groups.
N=1000 and R=250 samples.

Adjusted Rand Index
                Mean    St.Dev  q=0.025  q=0.25   q=0.5  q=0.75  q=0.975
Pairwise C    0.7518    0.2478   0.5446  0.7543  0.8436  0.9004   0.9385
Pairwise SCR  0.9970    0.0040   0.9958  0.9959  1.0000  1.0000   1.0000

Loss
                Mean    St.Dev  q=0.025  q=0.25   q=0.5  q=0.75  q=0.975
Pairwise C    0.2230    0.1247   0.1216  0.1560  0.1908  0.2305   0.3016
Pairwise SCR  0.0190    0.0143   0.0032  0.0082  0.0182  0.0283   0.0333


Table 7: Simulation results: ARI and loss for the posterior probabilities.
Data generated from a two-component latent mixture; 5 ordinal variables
with 5 categories; three of them are noisy variables. Separated groups.
N=5000 and R=250 samples.

Adjusted Rand Index
                Mean    St.Dev  q=0.025  q=0.25   q=0.5  q=0.75  q=0.975
Pairwise C    0.9300    0.0178   0.9204  0.9251  0.9283  0.9323   0.9353
Pairwise SCR  0.9985    0.0013   0.9975  0.9984  0.9984  0.9992   1.0000

Loss
                Mean    St.Dev  q=0.025  q=0.25   q=0.5  q=0.75  q=0.975
Pairwise C    0.1326    0.0229   0.1288  0.1331  0.1365  0.1394   0.1439
Pairwise SCR  0.0174    0.0065   0.0109  0.0148  0.0178  0.0200   0.0235





Table 8: Simulation results: ARI and loss for the posterior probabilities.
Data generated from a two-component latent mixture; 5 ordinal variables
with 5 categories; three of them are noisy variables. Non-separated groups.
N=1000 and R=250 samples.

Adjusted Rand Index
                Mean    St.Dev  q=0.025  q=0.25   q=0.5  q=0.75  q=0.975
Pairwise C    0.3696    0.2519   0.0619  0.2071  0.4022  0.5364   0.6481
Pairwise SCR  0.8722    0.0649   0.8544  0.8685  0.8809  0.8915   0.9002

Loss
                Mean    St.Dev  q=0.025  q=0.25   q=0.5  q=0.75  q=0.975
Pairwise C    0.3968    0.1194   0.2874  0.3216  0.3646  0.4345   0.5237
Pairwise SCR  0.1517    0.0317   0.1340  0.1414  0.1475  0.1536   0.1655


Table 9: Simulation results: ARI and loss for the posterior probabilities.
Data generated from a two-component latent mixture; 5 ordinal variables
with 5 categories; three of them are noisy variables. Non-separated groups.
N=5000 and R=250 samples.

Adjusted Rand Index
                Mean    St.Dev  q=0.025  q=0.25   q=0.5  q=0.75  q=0.975
Pairwise C    0.7276    0.1034   0.6850  0.7255  0.7539  0.7731   0.7914
Pairwise SCR  0.8823    0.0086   0.8736  0.8789  0.8825  0.8858   0.8906

Loss
                Mean    St.Dev  q=0.025  q=0.25   q=0.5  q=0.75  q=0.975
Pairwise C    0.2483    0.0393   0.2200  0.2307  0.2409  0.2518   0.2686
Pairwise SCR  0.1407    0.0050   0.1357  0.1388  0.1405  0.1426   0.1458
10.2 Data generated from a misspecified model


Table 10: Simulation results: ARI and loss for the posterior probabilities.
Data generated from a two-component latent mixture; 5 ordinal variables
with 5 categories; three of them are less informative. Separated groups.
N=1000 and R=250 samples.

Adjusted Rand Index
                Mean    St.Dev  q=0.025  q=0.25   q=0.5  q=0.75  q=0.975
Pairwise C    0.8970    0.1857   0.8470  0.9392  0.9672  0.9837   0.9918
Pairwise SCR  0.9950    0.0044   0.9918  0.9919  0.9959  0.9959   1.0000

Loss
                Mean    St.Dev  q=0.025  q=0.25   q=0.5  q=0.75  q=0.975
Pairwise C    0.1152    0.1098   0.0360  0.0556  0.0839  0.1155   0.1717
Pairwise SCR  0.0269    0.0132   0.0116  0.0207  0.0282  0.0327   0.0400


Table 11: Simulation results: ARI and loss for the posterior probabilities.
Data generated from a two-component latent mixture; 5 ordinal variables
with 5 categories; three of them are less informative. Separated groups.
N=5000 and R=250 samples.

Adjusted Rand Index
                Mean    St.Dev  q=0.025  q=0.25   q=0.5  q=0.75  q=0.975
Pairwise C    0.9908    0.0087   0.9877  0.9918  0.9934  0.9943   0.9959
Pairwise SCR  0.9954    0.0019   0.9934  0.9951  0.9959  0.9967   0.9975

Loss
                Mean    St.Dev  q=0.025  q=0.25   q=0.5  q=0.75  q=0.975
Pairwise C    0.0381    0.0142   0.0274  0.0317  0.0355  0.0388   0.0473
Pairwise SCR  0.0275    0.0053   0.0222  0.0253  0.0275  0.0300   0.0328



Table 12: Simulation results: ARI and loss for the posterior probabilities.
Data generated from a two-component latent mixture; 5 ordinal variables
with 5 categories; three of them are less informative. Non-separated groups.
N=1000 and R=250 samples.

Adjusted Rand Index
                Mean    St.Dev  q=0.025  q=0.25   q=0.5  q=0.75  q=0.975
Pairwise C    0.4382    0.2819   0.0673  0.2324  0.5563  0.6501   0.7172
Pairwise SCR  0.8817    0.0643   0.8571  0.8777  0.8915  0.9046   0.9165

Loss
                Mean    St.Dev  q=0.025  q=0.25   q=0.5  q=0.75  q=0.975
Pairwise C    0.3662    0.1273   0.2442  0.2752  0.3200  0.4323   0.5176
Pairwise SCR  0.1497    0.0310   0.1280  0.1369  0.1454  0.1546   0.1648


Table 13: Simulation results: ARI and loss for the posterior probabilities.
Data generated from a two-component latent mixture; 5 ordinal variables
with 5 categories; three of them are less informative. Non-separated groups.
Equidistant thresholds. N=5000 and R=250 samples.

Adjusted Rand Index
                Mean    St.Dev  q=0.025  q=0.25   q=0.5  q=0.75  q=0.975
Pairwise C    0.5390    0.2498   0.1263  0.6519  0.6762  0.6944   0.7083
Pairwise SCR  0.9050    0.0114   0.8950  0.8997  0.9055  0.9102   0.9161

Loss
                Mean    St.Dev  q=0.025  q=0.25   q=0.5  q=0.75  q=0.975
Pairwise C    0.3176    0.0977   0.2466  0.2581  0.2725  0.2864   0.4607
Pairwise SCR  0.1359    0.0076   0.1285  0.1324  0.1358  0.1390   0.1428